Statistics - Hypothesis Testing (1)
This article introduces the basic concepts of statistical hypothesis testing: the null and alternative hypotheses, the testing procedure, and the two types of testing error.
In statistics, a hypothesis is a claim or tentative assumption about a population parameter.
Types of Hypotheses #
A hypothesis test involves two competing hypotheses:
1. Null Hypothesis (H0) #
The null hypothesis represents the hypothesis that there is no change or difference compared to the original, serving as a kind of ‘default’ hypothesis.
The content of the null hypothesis varies depending on the test method. For example, a claim such as “the means of two groups are equal” can be set as a null hypothesis.
2. Alternative Hypothesis (H1) #
The alternative hypothesis is the claim that opposes the null hypothesis. It is the hypothesis the researcher seeks to support with evidence from sample data. For example, a claim such as “the means of two groups are different” can be set as an alternative hypothesis.
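For the two-group example, the pair of hypotheses can be written symbolically. The symbols \mu_1 and \mu_2 (the population means of the two groups) are notation assumed here for illustration:

```latex
H_0 : \mu_1 = \mu_2
\qquad \text{versus} \qquad
H_1 : \mu_1 \neq \mu_2
```

This is a two-sided alternative. A one-sided alternative such as \mu_1 > \mu_2 would be used if only one direction of difference were of interest.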
A statistical test uses the given data to decide whether to reject the null hypothesis. If the data provide sufficiently strong evidence against the null hypothesis, it is rejected in favor of the alternative hypothesis. Otherwise, the null hypothesis is not rejected.
This process of assessing a hypothesis against data is called hypothesis testing.
Hypothesis Testing #
1. Setting the Hypothesis #
The first step of hypothesis testing is setting the null hypothesis (H0) and the alternative hypothesis (H1) according to the problem being investigated.
2. Sample Analysis #
Next, a sample that is representative of the population is drawn. Data are collected from this sample and analyzed to provide the material for the statistical test.
3. Testing the Validity of the Hypothesis #
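The hypothesis is tested using the collected data. Based on the level of significance and the test statistic, the null hypothesis is either rejected or not rejected. Failing to reject the null hypothesis means only that the data provide no sufficient basis for rejection, not that the null hypothesis has been shown to be true.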
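The three steps above can be sketched end to end with a permutation test for the “two group means are equal” example. This is only one possible test, chosen here because it needs nothing beyond the Python standard library. The function name `permutation_test` and the sample values are made up for illustration.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for H0: the two group means are equal.

    Hypothetical helper for illustration; returns an estimated p-value.
    """
    rng = random.Random(seed)
    # Test statistic: absolute difference between the two sample means.
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are interchangeable, so reshuffle them.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Step 1: H0: the group means are equal; H1: the group means differ.
# Step 2: sample data (invented values, not real measurements).
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.2]
group_b = [4.2, 4.8, 4.4, 4.9, 4.1, 4.6, 4.3, 4.7]

# Step 3: compare the p-value with the level of significance.
alpha = 0.05
p_value = permutation_test(group_a, group_b)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p-value = {p_value:.4f} -> {decision}")
```

In practice a t-test from a statistics library would be a more common choice for this comparison. The permutation approach is used here only to keep the sketch self-contained.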
Level of Significance
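The level of significance is usually denoted by α (alpha). It is the threshold probability for rejecting the null hypothesis in an experiment or survey: the maximum probability of wrongly rejecting a true null hypothesis that the researcher is willing to tolerate.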
The commonly used level of significance is 0.05 (5%), but other values such as 0.01 or 0.10 can be used depending on the nature of the experiment or characteristics of the research.
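To see how the choice of α changes the decision rule, the sketch below computes two-sided rejection thresholds. It assumes a test whose statistic follows a standard normal distribution when the null hypothesis is true (a z-test), which is an assumption of this example rather than something fixed by the text. The helper name `two_sided_critical_value` is hypothetical.

```python
from statistics import NormalDist


def two_sided_critical_value(alpha):
    """Threshold c such that P(|Z| > c) = alpha for a standard normal Z."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05, 0.01):
    c = two_sided_critical_value(alpha)
    print(f"alpha = {alpha:.2f}: reject H0 when |z| > {c:.3f}")
```

A smaller α gives a larger threshold, so stronger evidence is needed before the null hypothesis is rejected.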
Test Statistic
The test statistic is a value computed from the sample that measures how far the collected data depart from what the null hypothesis predicts. It plays a central role in hypothesis testing: the decision to reject or not reject the null hypothesis is made by comparing it with the threshold implied by the level of significance.
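As one concrete instance, the sketch below computes the statistic of a one-sample z-test, z = (sample mean − μ0) / (σ / √n), together with its two-sided p-value. It assumes the population standard deviation σ is known, which is rarely true in practice. The function name `one_sample_z`, the data, and the values of `mu0` and `sigma` are all invented for illustration.

```python
import math
from statistics import NormalDist, mean


def one_sample_z(sample, mu0, sigma):
    """Return (z, two-sided p-value) for H0: population mean == mu0.

    Assumes the population standard deviation `sigma` is known.
    """
    standard_error = sigma / math.sqrt(len(sample))
    z = (mean(sample) - mu0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Invented measurements; H0 claims the population mean is 10.0.
sample = [10.2, 9.8, 10.5, 10.9, 10.4, 10.1, 10.7, 10.6, 10.3]
z, p_value = one_sample_z(sample, mu0=10.0, sigma=0.6)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

The further z lies from zero, the less compatible the sample is with the null hypothesis.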
In the process of testing a hypothesis, there is always a possibility of statistical error, referred to as hypothesis testing error.
Hypothesis Testing Error #
1. Type I Error #
A Type I Error refers to the error of rejecting the null hypothesis when it is true. Its probability is determined by the significance level: even when the null hypothesis is true, random sampling variation alone produces data extreme enough to reject it with probability α.
Example: Incorrectly concluding there is an effect when there is actually none
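A small simulation can make this visible. The sketch below repeatedly draws samples from a population in which the null hypothesis is true (mean 0, standard deviation 1) and counts how often a two-sided z-test rejects it anyway. The z-test with known standard deviation, the function name `type_i_error_rate`, and all parameter values are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def type_i_error_rate(alpha=0.05, n=30, trials=4000, seed=1):
    """Estimate how often a true H0 (mean == 0) is rejected at level alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # H0 is true here: every sample comes from a mean-0, sd-1 population.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / (1.0 / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # a Type I error
    return rejections / trials


rate = type_i_error_rate()
print(f"estimated Type I error rate: {rate:.3f}")
```

The estimated rate should land close to the chosen significance level, up to simulation noise.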
2. Type II Error #
A Type II Error refers to the error of failing to reject the null hypothesis when the alternative hypothesis is true. Its probability is usually denoted by β, and 1 − β is called the power of the test. A Type II error occurs when the test lacks the power to detect an effect that actually exists, for example because the sample is too small or the effect is small relative to the variability in the data.
Example: Failing to detect an effect in a statistical test and retaining the null hypothesis even though there actually is an effect
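The link between sample size and the Type II error probability (written β here, with power = 1 − β) can be sketched for a two-sided one-sample z-test. The test choice, the function name `z_test_power`, and the effect size and standard deviation below are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test when the true mean
    differs from the H0 value by `effect` (sigma assumed known)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under H1 the z statistic is normal with this mean and variance 1.
    shift = effect / (sigma / math.sqrt(n))
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


for n in (10, 20, 50, 100):
    power = z_test_power(effect=0.5, sigma=1.0, n=n)
    print(f"n = {n:3d}: power = {power:.3f}, beta = {1 - power:.3f}")
```

With the effect size held fixed, larger samples give higher power and therefore a smaller chance of a Type II error.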